Goodness-of-Fit Test for a Single Categorical Variable
Goodness-of-Fit tests test whether or not a categorical variable fits a particular multinomial probability distribution. A multinomial probability distribution gives the probabilities for three or more categorical outcomes of a given variable. The multinomial probability distribution for a variable with \(\text{k\ }\)categorical outcomes is:
\[p_1 = P_1 = probability\ of\ an\ outcome\ being\ in\ category\ 1 \\ p_2 = P_2 = probability\ of\ an\ outcome\ being\ in\ category\ 2 \\ \ldots \\ p_k = P_k = probability\ of\ an\ outcome\ being\ in\ category\ k\]
where
\[P_1,P_2,\ldots,P_k\ are\ probabilities\ between\ 0\ and\ 1\]
\[k = the\ number\ of\ categories\ in\ the\ variable\]
| Hypotheses for Goodness-of-Fit Tests |
|---|
\[H_{0}:The\ probability\ distribution\ is\ \text{p}_{\mathbf{1}} = P_{1},\ \tex t{p}_{\mathbf{2}} = P_{2},\ \ldots,\ \text{p}_{\mathbf{k}} = P_{k}\] \[H_{A}:The\ prob ability\ distribution\ is\ not\ \text{p}_{\mathbf{1}} = P_{1},\ \tex t{p}_{\mathbf{2}} = P_{2},\ \ldots,\ \text{p}_{\mathbf{k}} = P_{k}\] |
| Answers questions about: |
If the population distribution of a given categorical variable fits \(\text{p}_{\mathbf{1}} = P_{1},\ \te xt{p}_{\mathbf{2}} = P_{2},\ \ldots,\ \text{p}_{\mathbf{k}} = P_{k}\) or not. |
NOTE:
|
Here is the idea behind Goodness of Fit tests: if the null hypothesis is true, then the multinomial probability distribution in \(H_{0}\ \)truly describes the population. This means that the observed frequencies from the sample should be the same as (or very close to) the expected frequencies calculated from the \(H_{0}\ \)probability distribution. (For example, if 25% of the population fits into Category A, then about 25% of the sample should fit into Category A.) Large differences between the observed frequencies and the expected frequencies are evidence that the underlying population is NOT distributed according to the hypothetical multinomial probability distribution.
Some variation between observed and expected frequencies will occur due to random chance, so we need to use our hypothesis testing techniques to see if the differences we see are statistically significant. The \(\text{α\ significance\ level}\) sets a threshold at which we determine the sample distribution to be different enough from what we would expect under the null hypothesis, to contradict the null hypothesis.
The size of the difference between the observed and expected frequencies is quantified in our \(\chi^{2}\ \)test statistic. When the test statistic is large enough to reject the null, we can conclude that the population is not distributed according to the hypothetical multinomial probability distribution.
The test statistic for a Goodness-of-Fit Test is:
\[\chi_{test}^{2} = \sum_{i = 1}^{k}\frac{\left(f_{i} - e_{i}\right)^{2}}{e_{i}}\]
\[where\]
\[e_i = nP_i = the\ expected\ frequency\ for\ category\ i \\ f_i = the\ observed\ sample\ frequency\ for\ category\ i \\ k = the\ number\ of\ categories \\ n = sample\ size\]
\[and\]
\[the\ degrees\ of\ freedom\ are\ df = k - 1\]
Only large discrepancies between the observed and expected frequencies constitute evidence against the null hypothesis, so all Goodness-of-Fit tests are upper tail tests. (If the test statistic was in the lower tail, that would mean there were only small discrepancies between observed and expected).
When to Reject \(H_0\) in a Goodness-of-Fit Test
+==================================+==================================+ | | Always an Upper Tail Test: | +———————————-+———————————-+ | p-value approach: | Calculate the upper tail | | | \(p-value\) of | | | \(\chi_{test}^{2}\) | | | | | | If the | | | \(p-value \leq \alpha\), | | | then reject \(H_{0}\) and accept | | | \(H_{A}\). | | | | | | If the | | | \(p-value > \alpha\), then | | | do not reject \(H_{0}\). \(H_{A}\) | | | is unsupported. | +———————————-+———————————-+ | Critical Value: Approach | Look up the UT Critical Value of | | | \(\chi^{2}\), which is | | | \(\chi_{\alpha,UT}^{2}\) | | | | | | If | | | \(\chi_{test}^{2} \geq \chi_{\alpha,UT}^{2}\) | | | , $then | | | reject \(H_{0}\)and accept | | | \(H_{A}\). | | | | | | If | | | \(\chi_{\text{tes | | | t}}^{2} < \chi_{\alpha,UT}^{2}\), | | | then do not reject | | | \(H_{0}\text{.\ }H_{A}\) is | | | unsupported. | +———————————-+———————————-+ | NOTES: | | | | | | 1) \(\chi_{\text{test}}^{2}\ \)is | | | a Test Statistic | | | | | | 2) \(\chi_{\alpha,UT}^{2}\) is an | | | Upper Tail Critical Value | | | | | | 3) \(\chi^{2}\) is based on | | | degrees of freedom. The | | | degrees of freedom in | | | Goodness-of-Fit Tests is | | | | | | \[df = k - 1\ where\ k\ i | | | s\ the\ number\ of\ categories\] | | +———————————-+———————————-+
(Note: This explanation of interpretation holds for ALL hypothesis tests.) We start every hypothesis test with a question about the parameter of interest, so we must end every hypothesis test with the answer to that question. In other words, we must interpret the conclusion of our test in terms of the original question.
Remember: in hypothesis testing you can never prove the null hypothesis. You can only prove the alternative hypothesis: when you reject the null and accept the alternative, then at your given \(\alpha\) level of significance you may conclude that \(H_{A}\) is true. If you do not reject \(H_{0},\ \)then you must conclude that \(H_{A}\) is unsupported by the evidence. This gives us a clear guideline for how to interpret hypothesis tests: always look to the alternative hypothesis!
In all that follows, you must substitute the actual words and numbers from your hypothesis test for the symbols. Notice that each interpretation simply states the alternative hypothesis in words, and says either that we can or cannot conclude that it is true.
| How to Interpret a Goodness-of-Fit Test: | |
|---|---|
| When you: | Interpretation: |
| Reject \(\mathbf{H}_{\mathbf{0}}\) | At the \(\text{α\ }\)significance level, we can conclude that the probability distribution of \[*the variable*\] is not \(\text{p}_{1} = P_{1},\ \text{p}_{2} = P_{2},\ \ldots,\ \text{p}_{k} = P_{k}\). |
| Do not reject \(\mathbf{H}_{\mathbf{0}}\) | At the \(\text{α\ }\)significance level, we cannot conclude that the probability distribution of \[*the variable*\] is different from \(\text{p}_{1} = P_{1},\ \text{p}_{2} = P_{2},\ \ldots,\ \text{p}_{k} = P_{k}\). |
NOTES:
|
Assumptions Underlying These Hypothesis Tests
All hypothesis tests use sampling distributions to determine the probability of sample statistics. In order for us to be confident that our choice of sampling distribution for any given test really is the way the sample statistic is distributed, certain assumptions must be met. If the assumptions are not met – that is, if any given assumption is not true – then we cannot rely on the results of the hypothesis tests. They may mislead us, give us the wrong answers, and cause us to draw the wrong conclusions.
For Goodness-of-Fit Tests, the only assumption that must be satisfied is that the expected frequency for each category must be greater than five:
\[e_{i} \geq 5\]
Exercise: A bike shop sells bikes with frames made from four different materials: carbon fiber, aluminum, steel, and titanium. In the past, the share of sales by type of frame was 17% carbon fiber, 53% aluminum, 12% steel, and 18% titanium. The bike shop decided to use a Goodness-of-Fit test at the \(\alpha = 0.05}\) significance level to see whether the probability of sales by frame type has changed from what it was in the past. In a random sample of 250 bike sales, the following frequencies were observed:
| Category | Frame Type | Number of bikes sold |
|---|---|---|
| 1 | Carbon Fiber | 66 |
| 2 | Aluminum | 128 |
| 3 | Steel | 16 |
| 4 | Titanium | 40 |
Start by formulating the null and alternative hypotheses:
From the null hypothesis, fill in the Hypothesized Probability column.
From the sample data, fill in the Observed Frequency column.
For each row, the Expected Frequency is \(e_{i} = n\left( P_{\text{i\ }} \right).\)
| Frame ma terial | Ca tegory | Hypoth esized Proba bility |
Observed Fre quency | Expected Fre quency | Di fference Sq uared/ Expected Fre quency |
|---|---|---|---|---|---|
| \[(i)\] | \[\left( P_{i} \ right)\] | \[\left( f_{i} \ right)\] | \[\left( e_{i} \ right)\] | \[\ frac{{(f _{i}\ - \ e_{i} )}^{2}}{ e_{i}}\] | |
| Carbon Fiber | 1 | ||||
| Aluminum | 2 | ||||
| Steel | 3 | ||||
| Titanium | 4 | ||||
| \[\sum_ {}^{}\ma thbf{f}_ {\mathbf {i}}\mat hbf{=}\] | \[ \chi_{\t ext{test }}^{2} = \ \sum_ {i = 1}^ {4}{\fra c{{(f_{i }\ - \ e_{i})}^ {2}}{e_{ i}} =}\] |
The sum of the values in the last column is the \(\chi^{2}\) test statistic. The test statistic and the critical value have k – 1 degrees of freedom, where k is the number of categories.
Now you are ready to complete the hypothesis test.
(left blank for hypothesis test)
Because we have determined that the distribution has changed from what it was in the past, the bike shop now wants more details. Which categories have seen the biggest changes? What adjustments should be made to inventory?
\[CHART\]